Search results for "Bellman equation"

Showing 10 of 26 documents

Online fitted policy iteration based on extreme learning machines

2016

Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains limited for several reasons. Particularly important among these are the large quantity of data the agent requires to learn useful policies and the poor scalability to high-dimensional problems caused by the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that makes progress on both fronts. OFPI is based on a semi-batch scheme that increases convergence speed by reusing data and enables the use of global approximators by reformulating the valu…

Keywords: 0209 industrial biotechnology; Information Systems and Management; Radial basis function network; Artificial neural network; Computer science; business.industry; Stability (learning theory); 02 engineering and technology; Machine learning; computer.software_genre; Management Information Systems; 020901 industrial engineering & automation; Artificial Intelligence; Bellman equation; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); Reinforcement learning; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Software; Extreme learning machine; Knowledge-Based Systems
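The fitted, data-reusing scheme the abstract describes can be sketched generically. This is a minimal sketch under stated assumptions: a toy 4-state chain, a dictionary-averaging stand-in for the paper's extreme-learning-machine approximator, and plain fitted Q-iteration rather than OFPI's exact semi-batch schedule.

```python
# Illustrative fitted Q-iteration: a batch of transitions is collected once
# and reused at every sweep, and a single global regressor is refit to
# Bellman targets. The environment, regressor, and constants are stand-ins,
# not the ELM approximator or the semi-batch schedule of the paper.
import random

random.seed(1)
GAMMA = 0.9
ACTIONS = [0, 1]          # 0: stay, 1: advance on a 4-state chain; goal = state 3

def sample_transition():
    s = random.randrange(3)           # non-terminal states 0..2
    a = random.choice(ACTIONS)
    s2 = min(s + a, 3)
    r = 1.0 if s2 == 3 else 0.0
    return s, a, r, s2

data = [sample_transition() for _ in range(500)]   # collected once, reused below

class AveragingRegressor:
    """Toy stand-in for a global approximator: mean target per (state, action)."""
    def __init__(self):
        self.table = {}
    def fit(self, xs, ys):
        sums = {}
        for x, y in zip(xs, ys):
            t, n = sums.get(x, (0.0, 0))
            sums[x] = (t + y, n + 1)
        self.table = {x: t / n for x, (t, n) in sums.items()}
    def predict(self, x):
        return self.table.get(x, 0.0)

Q = AveragingRegressor()
for _ in range(50):                   # fitted sweeps over the same batch
    xs, ys = [], []
    for s, a, r, s2 in data:
        if s2 == 3:                   # terminal successor: no bootstrap
            target = r
        else:                         # Bellman target from the previous fit
            target = r + GAMMA * max(Q.predict((s2, b)) for b in ACTIONS)
        xs.append((s, a))
        ys.append(target)
    Q.fit(xs, ys)

print([max(ACTIONS, key=lambda a: Q.predict((s, a))) for s in range(3)])  # → [1, 1, 1]
```

The pattern is the point: because the regressor is refit to the whole stored batch, every transition is reused at every iteration, which is the data-reuse idea the abstract attributes to the semi-batch scheme.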

Opinion dynamics in social networks through mean field games

2016

Emulation, mimicry, and herding behaviors are phenomena that are observed when multiple social groups interact. To study such phenomena, we consider in this paper a large population of homogeneous social networks. Each such network is characterized by a vector state, a vector-valued controlled input, and a vector-valued exogenous disturbance. The controlled input of each network aims to align its state to the mean distribution of other networks' states in spite of the actions of the disturbance. One of the contributions of this paper is a detailed analysis of the resulting mean-field game for the cases of both polytopic and $\mathcal{L}_2$ bounds on controls and disturbances. A second contrib…

Keywords: 0209 industrial biotechnology; education.field_of_study; Control and Optimization; Disturbance (geology); Applied Mathematics; Population; 020206 networking & telecommunications; 02 engineering and technology; State (functional analysis); 020901 industrial engineering & automation; Mean field theory; Control theory; Bellman equation; Convergence (routing); 0202 electrical engineering, electronic engineering, information engineering; Spite; Herding; Opinion Dynamics; Settore MAT/09 - Ricerca Operativa; education; Mathematics
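In the standard mean-field game formalism (generic textbook notation with value function $v$ and population density $m$; this is not the specific robust polytopic/$\mathcal{L}_2$ formulation analyzed in the paper), each agent's Hamilton-Jacobi-Bellman equation runs backward in time while the distribution it reacts to is transported forward:

```latex
% Illustrative forward-backward mean-field game system.
\begin{aligned}
-\partial_t v \;-\; \nu \Delta v \;+\; H(x, \nabla v) \;&=\; F(x, m), \\
\partial_t m \;-\; \nu \Delta m \;-\; \operatorname{div}\!\big( m \, \nabla_p H(x, \nabla v) \big) \;&=\; 0,
\end{aligned}
\qquad v(T,\cdot) = G(\cdot, m(T)), \quad m(0) = m_0 .
```

The backward equation gives each agent's best response to the crowd, and the forward equation describes the crowd that results when everyone plays that response; an equilibrium is a fixed point of this loop.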

Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations

2020

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors in embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and authoring it is difficult and unsafe because modifying one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents…

Keywords: 0209 industrial biotechnology; reinforcement learning; Computer science; General Mathematics; 02 engineering and technology; pedestrian simulation; Task (project management); learning by demonstration; 020901 industrial engineering & automation; Learning (Aprenentatge); Computer science (Informàtica); Bellman equation; 0202 electrical engineering, electronic engineering, information engineering; Computer Science (miscellaneous); Reinforcement learning; Engineering (miscellaneous); business.industry; causal entropy; lcsh:Mathematics; Process (computing); 020206 networking & telecommunications; Function (mathematics); inverse reinforcement learning; lcsh:QA1-939; Problem domain; Table (database); Artificial intelligence; Temporal difference learning; business; optimization; Mathematics
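The classic Temporal Difference setup the abstract refers to, with the value function stored as a numeric table and the behavior read off greedily afterwards, can be sketched minimally. The 5-state chain environment and the constants are illustrative, not from the paper.

```python
# Tabular TD(0) on a toy chain, then a greedy read-out from the learned table.
import random

random.seed(0)
N_STATES = 5                 # states 0..4; entering state 4 ends the episode
ALPHA, GAMMA = 0.05, 0.9

V = [0.0] * N_STATES         # the value table

def step(s, a):
    """Move left (a=-1) or right (a=+1); reward 1.0 only on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for _ in range(5000):                      # episodes under a random behavior policy
    s, done = 0, False
    while not done:
        s2, r, done = step(s, random.choice([-1, 1]))
        # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        target = r + (0.0 if done else GAMMA * V[s2])
        V[s] += ALPHA * (target - V[s])
        s = s2

def greedy_action(s):
    # one-step lookahead on the (deterministic) model, greedy w.r.t. V
    qs = {}
    for a in (-1, 1):
        s2, r, done = step(s, a)
        qs[a] = r + (0.0 if done else GAMMA * V[s2])
    return max(qs, key=qs.get)

policy = [greedy_action(s) for s in range(N_STATES - 1)]
print(policy)   # greedy policy derived from the table; values rise toward the goal
```

The fragility the abstract mentions is visible here: editing one entry of `V` by hand can silently flip `greedy_action` in several states at once.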

Adaptive dual control in one biomedical problem

2003

In this paper, the following biomedical problem is considered. People are subjected to a certain chemotherapeutic treatment. The optimal dosage is the maximal dose for which an individual patient's toxicity level does not exceed the allowable limit. We discuss sequential procedures for finding the optimal dosage, based on the concept of dual control and the principle of optimality. According to dual control theory, the control has two purposes that might be conflicting: one is to help learn about unknown parameters and/or the state of the system (estimation); the other is to achieve the control objective. Thus the resulting control sequence exhibits the closed…

Keywords: Adaptive control; Control (management); Theoretical Computer Science; Dual (category theory); Control and Systems Engineering; Control theory; Bellman equation; Computer Science (miscellaneous); Dual control theory; A priori and a posteriori; Cybernetics; Limit (mathematics); Engineering (miscellaneous); Social Sciences (miscellaneous); Mathematics; Kybernetes
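The two-purpose structure described above follows from the Bellman recursion over the information state. In generic adaptive-control notation (the symbols $\mathcal{I}_k$, $c_k$, $V_k$ are illustrative, not taken from the paper):

```latex
% Illustrative dynamic programming recursion over the information state I_k,
% which collects past inputs and observations.
V_k(\mathcal{I}_k) \;=\; \min_{u_k} \; \mathbb{E}\!\left[\, c_k(x_k, u_k) \,+\, V_{k+1}(\mathcal{I}_{k+1}) \;\middle|\; \mathcal{I}_k \right].
```

Because $u_k$ shapes both the current cost and the informativeness of the next observation inside $\mathcal{I}_{k+1}$, the minimizing input trades off probing (estimation) against regulation (the control objective), which is the dual effect the abstract mentions.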

Emergent Collective Behaviors in a Multi-agent Reinforcement Learning Pedestrian Simulation: A Case Study

2015

In this work, a Multi-agent Reinforcement Learning framework is used to generate simulations of groups of virtual pedestrians. The aim is to study the influence of two different learning approaches on the quality of the generated simulations. The case study consists of simulating the crossing of two groups of embodied virtual agents inside a narrow corridor. This scenario is a classic experiment in the pedestrian-modeling area because a collective behavior, specifically lane formation, emerges with real pedestrians. The paper studies the influence of different learning algorithms, function approximation approaches, and knowledge transfer mechanisms on the performance of learned ped…

Keywords: Collective behavior; Function approximation; business.industry; Computer science; Bellman equation; Vector quantization; Probabilistic logic; Reinforcement learning; Artificial intelligence; business; Transfer of learning; Knowledge transfer; Simulation

TUG-OF-WAR, MARKET MANIPULATION, AND OPTION PRICING

2014

We develop an option pricing model based on a tug-of-war game between the issuer and the holder of the option. This two-player zero-sum stochastic differential game is formulated in a multi-dimensional financial market, and the agents try, respectively, to manipulate/control the drift and the volatility of the asset processes in order to minimize and maximize the expected discounted pay-off defined at the terminal date $T$. We prove that the game has a value and that the value function is the unique viscosity solution to a terminal value problem for a partial differential equation involving the non-linear and completely degenerate parabolic infinity Laplace operator.

Keywords: Computer Science::Computer Science and Game Theory; Economics and Econometrics; Partial differential equation; Computer science; Applied Mathematics; 010102 general mathematics; MathematicsofComputing_NUMERICALANALYSIS; Black–Scholes model; 01 natural sciences; 010101 applied mathematics; Terminal value; Valuation of options; Accounting; Infinity Laplacian; Bellman equation; Differential game; 0101 mathematics; Viscosity solution; Mathematical economics; Social Sciences (miscellaneous); Finance; Mathematical Finance
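For reference, the infinity Laplace operator named above is, in its normalized form and with generic symbols (not notation taken from the paper):

```latex
\Delta_\infty u \;=\; \frac{1}{\lvert \nabla u \rvert^{2}} \sum_{i,j=1}^{n} \partial_{x_i} u \;\, \partial_{x_j} u \;\, \partial^{2}_{x_i x_j} u .
```

The operator controls second derivatives only in the direction of the gradient, which is why the associated parabolic terminal value problem is completely degenerate and must be understood in the viscosity sense.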

Stochastic multicriteria acceptability analysis using the data envelopment model

2006

Data envelopment analysis (DEA) and stochastic multicriteria acceptability analysis (SMAA-2) are methods for evaluating alternatives based on multiple criteria. While DEA is mainly an ex-post tool used for classifying alternatives into efficient and inefficient ones, SMAA-2 is an ex-ante tool for supporting multiple criteria decision-making. Both methods use a kind of value function where the importance of criteria is modeled using weights. Unlike many other methods, neither DEA nor SMAA-2 requires decision-makers' weights as input. Instead, these so-called non-parametric methods explore the weight space in order to identify weights favorable for each alternative. This paper introd…

Keywords: Decision support system; Stochastic multicriteria acceptability analysis; Mathematical optimization; Information Systems and Management; General Computer Science; Operations research; Weight space; Stochastic efficiency; Extension (predicate logic); Management Science and Operations Research; Industrial and Manufacturing Engineering; Modeling and Simulation; Bellman equation; Data envelopment analysis; Envelopment; Mathematics; European Journal of Operational Research
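The weight-space exploration common to both methods is easiest to see in the classical CCR ratio model of DEA (standard textbook form; the unit index $o$, outputs $y$, inputs $x$, and weights $u, v$ are generic notation, not taken from the paper):

```latex
\max_{u,\,v} \;\; \frac{\sum_{r} u_r \, y_{ro}}{\sum_{i} v_i \, x_{io}}
\qquad \text{s.t.} \qquad
\frac{\sum_{r} u_r \, y_{rj}}{\sum_{i} v_i \, x_{ij}} \;\le\; 1 \;\; \text{for every unit } j,
\qquad u_r,\, v_i \;\ge\; 0 .
```

Each unit is evaluated under the weights most favorable to itself, and it is rated efficient exactly when its optimal ratio equals 1; no decision-maker-supplied weights enter the model.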

On the best Lipschitz extension problem for a discrete distance and the discrete ∞-Laplacian

2012

This paper concerns the best Lipschitz extension problem for a discrete distance that counts the number of steps. We relate this absolutely minimizing Lipschitz extension with a discrete ∞-Laplacian problem, which arises as the dynamic programming formula for the value function of some ε-tug-of-war games. As in the classical case, we obtain the absolutely minimizing Lipschitz extension of a datum f by taking the limit as p → ∞ in a nonlocal p-Laplacian problem.

Keywords: Discrete mathematics; Mathematics (all); General Mathematics; Applied Mathematics; Mathematics::Analysis of PDEs; Tug-of-war games; Extension (predicate logic); Lipschitz continuity; Dynamic programming; Lipschitz domain; Bellman equation; Infinity Laplacian; Nonlocal p-Laplacian problem; Limit (mathematics); Lipschitz extension; Laplacian matrix; Laplace operator; Mathematics; Journal de Mathématiques Pures et Appliquées
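The dynamic programming formula referred to above takes, in a discrete setting and with generic notation, the familiar tug-of-war form: a fair coin decides which player moves the token to the neighboring vertex of their choice, so the value at an interior vertex averages the two players' best options,

```latex
u(x) \;=\; \frac{1}{2}\left( \max_{y \sim x} u(y) \;+\; \min_{y \sim x} u(y) \right)
\quad \text{for interior vertices } x,
\qquad u = f \ \text{on the boundary}.
```

A function satisfying this identity is discretely $\infty$-harmonic: the discrete $\infty$-Laplacian $\tfrac12\big(\max_{y \sim x} u(y) + \min_{y \sim x} u(y)\big) - u(x)$ vanishes.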

Explainable Reinforcement Learning with the Tsetlin Machine

2021

The Tsetlin Machine is a recent supervised machine learning algorithm that has obtained competitive results in several benchmarks, both in terms of accuracy and resource usage. It has been used for convolution, classification, and regression, producing interpretable rules. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. We combined the value iteration algorithm with the regression Tsetlin Machine, as the value function approximator, to investigate the feasibility of training the Tsetlin Machine through bootstrapping. Moreover, we document the robustness and accuracy of learning on several instances of the grid-world problem.

Keywords: Learning automata; Computer science; business.industry; Bootstrapping; Machine learning; computer.software_genre; Regression; Convolution; Robustness (computer science); Bellman equation; Reinforcement learning; Markov decision process; Artificial intelligence; Mathematics::Representation Theory; business; computer
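The value iteration baseline that the paper combines with the regression Tsetlin Machine can be sketched in plain tabular form. A minimal sketch under stated assumptions: the 4x4 grid, step cost, and discount factor are illustrative, and a dictionary table stands in for the Tsetlin Machine approximator.

```python
# Tabular value iteration on a small deterministic grid-world.
GAMMA = 0.95
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

V = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}

def step(state, action):
    if state == GOAL:
        return state, 0.0            # terminal: absorbing, no further reward
    r, c = state
    dr, dc = action
    nr = min(max(r + dr, 0), SIZE - 1)
    nc = min(max(c + dc, 0), SIZE - 1)
    nxt = (nr, nc)
    return nxt, (1.0 if nxt == GOAL else -0.04)   # small step cost elsewhere

# Sweep Bellman optimality backups until the table stops changing.
while True:
    delta = 0.0
    for s in V:
        if s == GOAL:
            continue
        best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-8:
        break

print(round(V[(SIZE - 2, SIZE - 1)], 3))  # state just above the goal → 1.0
```

Training an approximator "through bootstrapping", as the abstract puts it, replaces the table write `V[s] = best` with a regression fit to the same Bellman targets.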

Stackelberg equilibrium with multiple firms and setup costs

2017

I provide conditions that guarantee that a Stackelberg game with a setup cost and an integer number of identical leaders and followers has an equilibrium in pure strategies. The main feature of the game is that when the marginal follower leaves the market the price jumps up, so that a leader’s payoff is neither continuous nor quasiconcave. To show existence I check that a leader’s value function satisfies the following single crossing condition: When the other leaders produce more the leader never accommodates entry of more followers. If demand is strictly logconcave, and if marginal costs are both non decreasing and not flatter than average costs, then a Stackelberg equilibrium ex…

Keywords: Marginal cost; Stackelberg equilibrium; Economics and Econometrics; Setup cost; Applied Mathematics; 05 social sciences; Stochastic game; Existence of the equilibrium; Supermodular games; Cournot competition; Settore SECS-P/06 - Economia Applicata; Microeconomics; Quasiconvex function; Non quasiconcave payoff; Entry deterrence; Bellman equation; 0502 economics and business; Economics; Stackelberg competition; Market power; Limit (mathematics); 050207 economics; Settore SECS-P/01 - Economia Politica; Mathematical economics; 050205 econometrics; Journal of Mathematical Economics